ABSTRACT

How stable is the performance of your flash-based Solid 
State Drives (SSDs)? This question is central for database 
designers and administrators, cloud service providers, and 
SSD constructors. The answer depends on write-amplification, 
i.e., garbage collection overhead. More specifically, the
answer depends on how write-amplification evolves in time.

How then can one model and manage write-amplification, 
especially when application workloads change? This is the 
focus of this paper. Managing write-amplification boils down 
to managing the surplus physical space, called over-provisioned 
space. Modern SSDs essentially separate the physical space 
into several partitions, based on the update frequency of the 
pages they contain, and divide the over-provisioned space 
among the groups so as to minimize write-amplification. 
We introduce Wolf, a block manager that allocates
over-provisioned space to SSD partitions using a near-optimal
closed-form expression, based on the sizes and update
frequencies of groups of pages. Our evaluation shows that Wolf
is robust to workload changes, with an improvement factor
of 2 with respect to the state of the art. We also show that
Wolf performs comparably to, and even slightly better than, the
state of the art with stable workloads (over 20% improvement
with a TPC-C workload).

1. INTRODUCTION 

Solid State Drives (SSDs), based on flash chips, are now
the secondary storage of choice for data-intensive applications.
Database systems can now rely on high-performance
SSDs to store logs, indexes and data either on servers or
in the cloud. While SSDs provide increasingly high performance
out of the box, maintaining high throughput and
low latency as the utilization of SSDs increases, and despite
abrupt changes in the workload, remains a challenge.

On flash chips, data is organized into pages, each of which
typically comprises tens of kilobytes of data. Pages reside in
erase blocks. Before a page is updated, the underlying block
must be erased. A software layer called the Flash Translation
Layer (FTL) manages these constraints by updating
pages out-of-place. To enable out-of-place updates, SSDs are
given over-provisioned space: a portion of the physical storage
space which is not exposed to applications, but used by
the FTL to accommodate the before-images of pages. Eventually,
garbage-collection operations are required to reclaim
space occupied by before-images. Garbage-collection operations
migrate live pages from victim blocks (to be erased)
to other blocks. The flash operations generated by the
garbage collector interfere with application IOs. Such
interference causes reduced throughput and variable latency.

For instance, NTOSort^1, with top results in the JouleSort
benchmark, exhibits high variability for the 1 TB sort
when the Samsung 840 Pro SSDs are highly utilized.

The extent of this overhead is captured by write-amplification,
which is the ratio of the number of physical writes that
occur inside an SSD to the number of logical writes issued
by the application. A way to reduce write-amplification is
to increase over-provisioning.^2 Both SSD constructors and
application designers tune over-provisioning to trade storage
capacity for performance (see for example the note on
how over-provisioning is used to maximize lifetime and
performance for the Samsung 840 Pro SSD^3).

There is a direct relationship between write-amplification
and over-provisioning. A greater amount of over-provisioned
space reduces the average number of live pages per block,
and thus the average number of pages that need to be
migrated when a block is garbage-collected.

There is also a relationship between write-amplification
and the internals of the FTL. It has been shown [20] that a
way to reduce write-amplification is to design a block manager,
within the FTL, that writes hot and cold pages on
separate flash blocks, thereby essentially partitioning the
SSD into several groups based on the update frequency of
the pages they contain. Designing a block manager that
manages data placement across such SSD partitions is still
problematic though:

• Existing block managers exhibit pathological behaviors
when certain types of changes in the application
workloads occur. In the context of a database system,
examples of such abrupt changes include offline index
maintenance (which essentially forces database writes
to swap back and forth between data pages and index
pages), checkpointing (where writes switch between
log and data pages), or external algorithms (that force
large volumes of writes to temporary space). Moreover,
in cloud environments SSDs store various databases
with different workload characteristics subject to abrupt
changes in the workload [3]. We show that when a
group of pages abruptly changes temperature, existing
FTLs exhibit an elongated period of exceptionally high
write-amplification as over-provisioned space is slow to
move among groups.

• Existing block managers do not adapt well to realistic
workloads like TPC-C. The reason is that existing
methods make assumptions about the relative temperatures
of pages in different groups, which may be inaccurate.
The allocation of over-provisioned space based
on such assumptions is sub-optimal.

• There is no closed-form method for allocating
over-provisioned space to different groups. Existing methods
are either mathematically complex or based on
hill-climbing algorithms, which are computationally
expensive.

^1 http://sortbenchmark.org/NT0Sort2013.pdf
^2 Other ways to reduce write-amplification include compression
and deduplication. Both of these methods reduce the
volume of data that needs to be stored on SSD, at the cost
of increased CPU utilization. They are orthogonal to (and can
be combined with) our proposed contribution.
^3 http://www.Samsung.com/global/business/semiconductor/minisite/

In this paper, our contribution is twofold: 

1. We derive a mathematical expression that relates
over-provisioning to write-amplification under a uniform
workload.

2. We introduce a new block manager to address the
aforementioned problems, called Wolf, or Workload
Leveller for Flash. Wolf is able to detect and quickly
adapt to changes in workload by pro-actively reallocating
over-provisioned space among the groups based on
their changing needs. It adapts better to stable workloads
by measuring the update frequencies of groups
instead of making assumptions about them. It uses a
novel near-optimal closed-form expression to allocate
over-provisioned space to groups.

The rest of the paper is organized as follows. Section
2 explores related work. Section 3 introduces a system
model for the analysis, as well as the simulator we later
use. Section 4 simplifies an existing analysis of the relationship
between over-provisioning and write-amplification with
a uniform workload, and analyses the behavior of
garbage-collection efficiency over time in this context. In Section 5,
we introduce Wolf, a new block manager design that adapts
better to realistic workloads, especially ones that dynamically
change over time. We describe our evaluation of Wolf
in Section 6 and conclude in Section 7.

2. RELATED WORK 

We discuss existing work focused on modelling write-amplification,
and managing it in the context of an FTL design.

2.1 Modelling Write-Amplification 

There have been many efforts to relate write-amplification
and over-provisioning. Hu et al. [12] and Agrawal et al. [1]
developed probabilistic models for write-amplification, but
they do not fit simulation results well for all values of
over-provisioning. Haas et al. [?] proposed a model for
write-amplification that is directly fitted from simulation results.
Bux et al. [5] developed two complementary theoretical models
for write-amplification, but they do not give a closed-form
expression.

In 2012-2013, several papers [SlIllSo] gave an equation
that relates over-provisioning to write-amplification in terms
of the Lambert W function. Stoica et al. even simplified
this expression into one that does not rely on the Lambert
W function. The model presented in these works fits simulation
results nicely when the workload is stable and has been
running for a while.

In this paper, we derive a simpler closed-form equation 
that relates over-provisioning to write-amplification. 

2.2 Managing Write-Amplification 

In the design of the Log-Structured File System [?], the
authors noticed that when writing a stream of logical random
writes sequentially in the physical address space, one
can reduce cleaning costs by physically clustering together
logical addresses with the same update frequency. Several
works utilize this idea for flash in order to minimize
write-amplification.

Envy [?] proposed separating the physical space into
equally-sized groups of erase blocks. Pages are migrated to a
hotter or colder group based on their temperatures with the
overall goal of equalizing the groups’ cleaning costs, defined
as the average cost of cleaning a block multiplied by the
frequency of cleaning operations. DAC [7] also partitions the
physical space into equally-sized groups, but the migration
policies are different. A page is promoted to a hotter group
when it is updated, and only if the distance to its
last update is relatively short. On the other hand, a page
is demoted to a colder group during cleaning.
ContainerMarking [?] partitions the physical space into finer-grained
groups, promotes pages on updates and demotes pages
based on a statistical model.

Recently, Stoica et al. [20] proposed a scheme, which we
denote Frequency-based Data Placement (FDP) in the rest
of the paper, whereby the number and sizes of groups adapt
to the workload. In FDP, cleaning is performed independently
in each group using a least-recently-cleaned (LRU)
policy. Several methods are proposed for allocating
over-provisioned space to the different groups in order to minimize
overall write-amplification, yet no closed-form expression is given.
FDP outperforms existing FTL designs theoretically and under
realistic workloads.

In this work, we propose Wolf, which improves on FDP [20]
in several ways. It adapts better to realistic workloads since
the update frequencies of groups are continuously measured
and adapted. It avoids pathological behaviors that FDP
is subject to when application workloads change. It uses
a greedy cleaning policy in groups, as we identified scenarios
in which the LRU policy is suboptimal. Finally, it uses
a simple closed-form expression to decide how to allocate
over-provisioned space to the different groups.

3. SYSTEM MODEL 

We rely on the following SSD model throughout this paper.
There are B pages in a block, and K blocks in the
SSD. The SSD is partitioned into a number of logical units
called LUNs, on which flash operations can occur in parallel.
Several LUNs are connected to the SSD controller through
channels. Communication between the controller and the
LUNs through the channel is serial.

Notation   Description
B          Number of pages in a flash block
δ          The fraction of pages in a flash block that are on
           average migrated in one garbage-collection operation
LBA        Logical address space size in flash pages
PBA        Physical address space size in flash pages
OP         The number of over-provisioned flash pages.
           Note that OP = PBA - LBA
K          The number of blocks in the SSD.
           Note that K = PBA / B

Table 1: System Model

Parameter             Default Value
Channels              4
LUNs per channel      2
Blocks per LUN        1024
Pages per block (B)   128
Page size             16 KB
LBA / PBA             70%

Table 2: Default simulation parameters

The size of the logical address space in pages is LBA,
and the number of physical pages is PBA. The number
of pages of over-provisioned space is OP = PBA - LBA.
Over-provisioning is captured by the ratio LBA/PBA. The
notations used for the analysis are summarized in Table 1.
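As a quick sanity check of this notation, the default simulation parameters can be turned into the quantities of the system model. The sketch below is illustrative only: the function name `derive_model` and its argument names are assumptions of this example and do not come from the simulator.

```python
def derive_model(channels=4, luns_per_channel=2, blocks_per_lun=1024,
                 pages_per_block=128, page_size_kb=16, lba_over_pba=0.70):
    """Derive K, PBA, LBA and OP (all in pages or blocks) from the defaults."""
    k = channels * luns_per_channel * blocks_per_lun   # blocks in the SSD
    pba = k * pages_per_block                          # physical pages
    lba = int(pba * lba_over_pba)                      # logical pages, rounded down
    op = pba - lba                                     # over-provisioned pages
    capacity_gb = pba * page_size_kb / (1024 * 1024)   # physical capacity in GB
    return {"K": k, "PBA": pba, "LBA": lba, "OP": op, "capacity_gb": capacity_gb}


model = derive_model()
print(model)
```

With the defaults, the physical capacity comes out at 16 GB, which agrees with the capacity quoted for the default settings.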

We assume the FTL uses a page-level mapping scheme
to handle out-of-place updates. This means there are LBA
mapping entries, and when a page is updated, the physical
address corresponding to the page’s logical address is
changed. The mapping table is either stored in the host’s
RAM or on flash memory [?].
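To make the out-of-place update mechanism concrete, here is a minimal sketch of a page-level mapping table. The class `PageMappedFTL` and its fields are hypothetical names chosen for this illustration; the sketch deliberately omits garbage collection, LUNs and channels.

```python
class PageMappedFTL:
    """Toy page-level mapping with out-of-place updates (no garbage collection)."""

    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        self.mapping = {}                                # logical page -> (block, offset)
        self.live = [set() for _ in range(num_blocks)]   # live offsets in each block
        self.write_ptr = [0] * num_blocks                # next unwritten offset per block

    def write(self, logical_page):
        # Invalidate the before-image, if any. It keeps occupying flash
        # until its block is erased.
        old = self.mapping.get(logical_page)
        if old is not None:
            self.live[old[0]].discard(old[1])
        # Append the new version to the first block with an unwritten page.
        for block, ptr in enumerate(self.write_ptr):
            if ptr < self.pages_per_block:
                self.write_ptr[block] = ptr + 1
                self.live[block].add(ptr)
                self.mapping[logical_page] = (block, ptr)
                return block, ptr
        raise RuntimeError("no free pages: garbage collection would be needed")


ftl = PageMappedFTL(num_blocks=2, pages_per_block=2)
first = ftl.write(7)    # first version of logical page 7
second = ftl.write(7)   # update: written elsewhere, old copy invalidated
```

After the second write, only the mapping entry changes; the stale copy still consumes a physical page, which is why over-provisioned space and garbage collection are needed.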

In our baseline block manager, a page update is written to
some non-busy LUN that has a block with free pages. As the
LUN runs out of blocks with free space, cleaning operations
are triggered on it to clear space for new writes.
Garbage-collection works independently for each LUN. It selects a
victim, migrates any live pages to other blocks (which may
be on other LUNs), and finally erases the block. The block
is then returned to the pool of free blocks for that LUN.

The garbage-collection policy within a LUN can be LRU 
or greedy. The LRU policy targets the block that was erased 
the longest time ago. The greedy policy keeps track of the 
number of live pages in each block and targets the block 
with the least number of live pages. 
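The two victim-selection policies can be sketched as follows. The `Block` record and its field names are assumptions made for this illustration only.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    live_pages: int    # number of valid pages currently in the block
    last_erased: int   # virtual time of the block's last erase


def pick_victim_lru(blocks):
    """LRU: target the block that was erased the longest time ago."""
    return min(blocks, key=lambda b: b.last_erased)


def pick_victim_greedy(blocks):
    """Greedy: target the block with the fewest live pages."""
    return min(blocks, key=lambda b: b.live_pages)


blocks = [Block(0, 100, 5), Block(1, 20, 9), Block(2, 60, 1)]
lru_victim = pick_victim_lru(blocks)
greedy_victim = pick_victim_greedy(blocks)
```

In this example the two policies disagree: LRU picks block 2, the oldest, while greedy picks block 1, which requires the fewest migrations.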

We use the SSD simulator EagleTree [S] to validate
analytical results and evaluate system designs. EagleTree is
an open-source and extensible simulation framework for the
entire IO stack running in virtual time. We used the
above-mentioned policies and the default parameters in Table 2
for all simulations unless otherwise mentioned^4. We implemented
Wolf on top of the existing simulator, available on
GitHub^5.


^4 The default settings result in an SSD of 16 GB. We show
throughout the paper that increasing SSD capacity (i.e., the
number of channels, the number of LUNs per channel, the
number of blocks per LUN or page size) either has no impact
on our analysis, or that it actually amplifies the benefits of
our approach. This setting is thus conservative and it allows
us to conduct a thorough exploration of the design space in
simulation.
^5 https://github.com/ClydeProjects/EagleTree

4. WRITE-AMPLIFICATION

We now examine the relationship between over-provisioning
and write-amplification. We focus on a uniformly distributed
random workload, where an application write targets
any logical address with an equal probability.

4.1 The lifetime of a block 

As a first step, we model the lifetime of a block. Suppose
we have just finished writing a block. How many pages G
do we expect to remain valid after X application writes^6?

Since the workload is uniform, the probability that any
page update targets some page in the block is $B/\mathit{LBA}$. Thus,
the number of page updates before some page is invalidated
in the block is geometrically distributed with mean $\mathit{LBA}/B$. In other
words, $\mathit{LBA}/B$ page updates occur on average before the first
page in the block is invalidated. After this event, there are
B - 1 live pages in the block. Analogously, the expected
number of page updates before the next page in the block is
invalidated is $\mathit{LBA}/(B-1)$. The third page is invalidated after an
expected $\mathit{LBA}/(B-2)$ writes, and so on. The expected
number of page updates until there are no live pages in the
block can be simplified using Euler’s approximation for the
sum of the harmonic series up to n elements, where γ is the
Euler-Mascheroni constant.


$$\frac{\mathit{LBA}}{B} + \frac{\mathit{LBA}}{B-1} + \dots + \frac{\mathit{LBA}}{1} \approx \mathit{LBA}\,(\ln(B) + \gamma)$$


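More generally, suppose that after X updates there are on
average G live pages left in the block. It is easy to see that:

$$X = \mathit{LBA}\sum_{i=1}^{B}\frac{1}{i} - \mathit{LBA}\sum_{i=1}^{G}\frac{1}{i} \approx \mathit{LBA}\,(\ln(B) + \gamma) - \mathit{LBA}\,(\ln(G) + \gamma) = \mathit{LBA}\,\ln(B/G) \tag{1}$$

Expressed in terms of G, we get:

$$G = B \cdot e^{-X/\mathit{LBA}} \tag{2}$$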
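The quality of the logarithmic approximation can be checked numerically. The sketch below compares the exact harmonic sum implied by the per-page argument (LBA times the sum of 1/i for i from G+1 to B) with the closed form LBA · ln(B/G), and inverts the closed form to recover G. The function names are assumptions of this example, and the value of LBA is derived from the default simulation parameters.

```python
import math


def writes_until_live_exact(lba, b, g):
    """Expected application writes until a fresh block of b pages has g live pages."""
    return lba * sum(1.0 / i for i in range(g + 1, b + 1))


def writes_until_live_approx(lba, b, g):
    """Closed form: X = LBA * ln(B / G)."""
    return lba * math.log(b / g)


def expected_live_pages(lba, b, x):
    """Inverse of the closed form: G = B * exp(-X / LBA)."""
    return b * math.exp(-x / lba)


lba, b = 734003, 128
exact = writes_until_live_exact(lba, b, 64)
approx = writes_until_live_approx(lba, b, 64)
relative_error = abs(exact - approx) / exact
```

For half of a 128-page block, the two values differ by well under one percent, which suggests the approximation is adequate for blocks of realistic size.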

4.2 Equilibrium 

We now study the relationship between over-provisioning
(LBA/PBA) and the average fraction of migrations per
cleaning operation (δ).

We start by asking a simple question: what is the expected
number of cleaning operations throughout the SSD between
two times that the same block is cleaned? Under the LRU
policy, the number is K by definition. Under the greedy
policy, the number is K on average.

^6 We assume that LBA is much bigger than B, which means
the likelihood that a page in the block was invalidated before
we finished writing the block is negligible. Thus, we assume
all pages in this block are initially valid.

Figure 1: How LBA/PBA affects δ (left axis) and write
amplification (right axis) in equilibrium, assuming a uniform
workload.

Let us now ask how many application page updates are expected
between two times the same block is cleaned. By the
assumption of being in equilibrium, each cleaning operation
involves B · δ migrations. Conversely, each cleaning operation
clears space for B · (1 - δ) new application writes. Since
K blocks are cleaned on average between two times that the
same block is cleaned, and since each of them accommodates
B · (1 - δ) application writes, K · B · (1 - δ) page
updates occur on average between two times that the same
block is garbage-collected. We substitute this expression for
X in equation 1, with G = B · δ:


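$$\mathit{LBA} \cdot \ln\left(\frac{B}{B \cdot \delta}\right) = K \cdot B \cdot (1 - \delta)$$

Plugging in PBA/B for K, simplifying and rearranging, we
get:

$$\frac{\mathit{LBA}}{\mathit{PBA}} = \frac{\delta - 1}{\ln(\delta)} \tag{3}$$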


We can relate δ to write-amplification using the equation
WA = 1/(1 - δ). This is a characterization of
write-amplification at equilibrium.

Figure 1 plots equation 3. In appendix A, we show how
to express equation 3 in terms of δ. By so doing, we also
show that this equation is in fact equivalent to the findings
of previous analyses provided in [HEolE]. The interesting
part of our analysis is that the derivation and form of the
equation are simpler.
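Since the equilibrium equation gives LBA/PBA as a function of δ, one simple way to go in the other direction is a numerical root search. The sketch below uses plain bisection, which is a choice made for this example and not the closed-form inversion of the appendix; all function names are assumptions.

```python
import math


def lba_over_pba(delta):
    """Equilibrium relation: LBA/PBA = (delta - 1) / ln(delta), for 0 < delta < 1."""
    return (delta - 1.0) / math.log(delta)


def solve_delta(ratio, tol=1e-12):
    """Find delta for a given LBA/PBA by bisection.

    The right-hand side grows monotonically from 0 to 1 as delta goes
    from 0 to 1, so bisection applies. Ratios extremely close to 0 or 1
    fall outside the bracket used here.
    """
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if lba_over_pba(mid) < ratio:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def write_amplification(delta):
    """WA = 1 / (1 - delta)."""
    return 1.0 / (1.0 - delta)


delta = solve_delta(0.70)          # default setting: LBA/PBA = 70%
wa = write_amplification(delta)
```

For the default ratio of 70%, this yields a δ a little below one half, that is, a write-amplification somewhat under 2 for a uniform workload at equilibrium.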


5. WOLF: WORKLOAD LEVELLER FOR FLASH

We have seen how write-amplification is linked to
over-provisioning for random writes uniformly distributed across
the whole SSD. Now consider that the logical address space
is partitioned into n groups denoted as g_1, g_2, ..., g_n, that
each group has a different size s_1, s_2, ..., s_n measured in flash
pages, and that each group x may have a different update
frequency p_x, defined as the probability that an incoming
application update targets a page in group x.

Previous works [21, 20] have shown that storing
pages with different update frequencies in different groups of
physical blocks leads to reduced write-amplification in this
context. Existing approaches thus conceptually partition
the SSD into n virtual SSDs.


The problem is then to allocate the over-provisioned space
across virtual SSDs in a way that minimizes write-amplification.
Existing solutions do not always adapt well to realistic
workloads, and they exhibit poor performance under certain
types of workload changes that are typical of database workloads.
In this Section, we introduce Wolf, a block manager
designed to deal with these issues.

But first, let us describe the problems of existing approaches
in more detail. We base our study on the block
manager from Stoica et al. [20], which we denote FDP and which
has been shown to dominate the state of the art. The groups
in this scheme have a fixed order, and it is assumed that the
ith group contains hotter pages than the (i - 1)th group. A
page is promoted/demoted to a hotter/colder group on an
update/migration if it is deemed too hot/cold relative to the
other pages in its group.

Problems arise if the update frequency of a group abruptly
changes. The essential problem is that over-provisioned
space is slow to move among groups, which leads to a
potentially long period of suboptimal write-amplification.

For example, Figure 6 (in Section 6) illustrates a scenario
using FDP where we initially have two groups g_cold and g_hot
where s_cold = s_hot, p_cold = 10% and p_hot = 90%. At some
point, the update frequencies of the groups are abruptly
swapped. Pages in g_cold are now hot and flow into g_hot very
quickly as they are updated. On the other hand, pages
originally in g_hot are now cold, and they reside in blocks that
occupy a lot of over-provisioned space. Ideally, these pages
should be quickly demoted and compacted into g_cold in order
to free the underlying over-provisioned space, so that it can
now benefit the pages that turned hot. However, there is no
mechanism to do this quickly. The over-provisioned space is
only freed as the original blocks in g_hot are cleaned, and the
amount of time this takes depends on the garbage-collection
policy. Indeed, in Figure 6, write-amplification after the
temperature swap is slow to converge again.

The question we ask is thus the following: how can we
facilitate a speedy reallocation of over-provisioned space among
groups in order to minimize overall write-amplification as
fast as possible?

The scheme we propose, Wolf, addresses these concerns.
It is able to quickly detect changes in workload and quickly
move over-provisioned space among groups in response. A
key design principle is that when a change in workload occurs,
physical space, rather than logical pages (as in existing
schemes), should move among groups. Wolf is different from
existing schemes in the following ways.

It continuously records run-time statistics about the sizes
and update frequencies of groups. These statistics serve
three purposes. (1) They allow adjusting the relative ordering
of groups as the workload changes. (2) They allow
determining when it is worthwhile creating or merging groups
as the workload changes. (3) They serve as a reliable input
to determine how to reallocate the over-provisioned space
among groups as the workload changes.

Wolf uses a novel near-optimal closed-form method to 
compute how to allocate over-provisioned space among the 
groups. Compared to existing methods, the closed-form 
method constitutes a good compromise between cheapness, 
accuracy and simplicity. 

Wolf continually identifies groups with excess over-provisioned
space relative to the optimum, compacts pages in those
groups onto fewer physical blocks, and donates the redeemed
blocks to groups with a deficit of over-provisioned space.
This is achieved through movement operations. Within a
group, Wolf uses the greedy garbage-collection policy since
our analysis showed that it significantly outperforms the
LRU policy under certain workload changes.

We first describe the key data structures and skeleton
subroutines of Wolf. We then describe its novel aspects in more
detail.

5.1 Algorithm Outline and Key Data Structures

Wolf’s skeleton is structured around three key subroutines,
and the notion of intervals of application writes. The
first subroutine handles write operations and updates statistics
about each group within an interval. The second
subroutine handles the completion of an interval. In this
subroutine, updated statistics about groups’ sizes and update
frequencies are used to (1) re-sort the relative order of groups,
(2) consider whether or not to create or merge groups, (3)
recompute the optimal allocation of over-provisioned space
among groups, and (4) consider moving over-provisioned
space among groups. The third subroutine handles an erase
operation. It assigns a redeemed block to a group with an
over-provisioned space deficit, and considers triggering more
over-provisioning reallocation operations. We now explain
these subroutines in more detail.

Listing 1 shows how writes are handled. When a write
arrives, we determine which group it currently belongs to
by (1) looking up the corresponding physical address in the
usual logical to physical mapping table (orthogonal to Wolf),
and then (2) consulting a data structure called the Blocks
to Groups Map (BGM), which keeps track of which physical
blocks belong to which group. Now that we know the page’s
current group, we consult a temperature detector module
(TD) to determine if the page should remain in its current
group or be demoted/promoted to a colder/hotter group.
When we have a target group on which to write the page,
we find a free page in the group and execute the write.
Finally, we update statistics about the sizes of groups and the
number of writes targeting the groups per interval. The
routine concludes by checking if the target group is nearly
out of free pages, in which case garbage-collection is invoked
within it. Non-trivial parts of all subroutines are marked in
red and described in more detail in later sections.


In principle, the length of an interval, denoted as h, captures
a trade-off between responsiveness to changes in workload
and the computational expense of subroutine 2. Fortunately,
subroutine 2 is not expensive since the component
that would usually be expensive in it, namely the method
for recomputing the allocation of over-provisioned space, is
non-iterative and therefore extremely cheap. Thus, we can
set h to be small (in our implementation h = LBA · 0.00 £ 1).

Let us now describe subroutine 2. In lines 2 and 3, each
group’s update frequency is updated based on the proportion
of writes that targeted that group within the interval
that finished. Note that α is a constant that controls the
weighting of a group’s long-run measured update frequency
versus the update frequency in the interval that just finished
(we set α to h-3). In line 4, the allocation of over-provisioned
space for a group is recomputed.

In line 7, we sort a data structure called the Sorted Groups
Vector (SGV) based on updated statistics. This data structure
maintains the relative order of groups based on the
notion of hit rate, which we define for group x as p_x / s_x,
its update frequency over its size. As we will later see, this
structure is used by the temperature detector, and by the
policy that decides whether to create or merge groups. In
lines 8 and 9, we trigger routines that consider creating or
merging existing groups, and whether or not to start
reallocating over-provisioned space among groups.

Listing 2: handle interval completion

    1  for (group x in GROUPS) {
    2      U = x.interval_writes / h
    3      p_x = p_x * (1 - α) + α * U
    4      x.over_prov = allocate_over_prov(p_x, s_x)
    5      x.interval_writes = 0
    6  }
    7  SGV = sort_by_hit_rate(GROUPS)
    8  merge_or_create_groups()
    9  consider_movement_operations()
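The per-interval frequency update is an exponentially weighted moving average, and the ordering step sorts groups by hit rate. The sketch below illustrates both on the temperature-swap scenario. The value of `alpha`, the interval length `h`, the dictionary layout of a group and the coldest-to-hottest ordering are all assumptions chosen for this example, not the settings used in Wolf.

```python
def update_frequency(p_x, interval_writes, h, alpha):
    """Blend the long-run frequency p_x with the share of writes in the last interval."""
    u = interval_writes / h
    return p_x * (1.0 - alpha) + alpha * u


def sort_by_hit_rate(groups):
    """Order groups from coldest to hottest by hit rate p / s."""
    return sorted(groups, key=lambda g: g["p"] / g["s"])


# A group that used to receive 10% of the writes suddenly receives 90% of them.
p, h, alpha = 0.10, 1000, 0.5   # hypothetical values for illustration
history = []
for _ in range(10):
    p = update_frequency(p, interval_writes=900, h=h, alpha=alpha)
    history.append(p)

groups = [{"name": "a", "p": 0.9, "s": 100}, {"name": "b", "p": 0.1, "s": 100}]
ordered = [g["name"] for g in sort_by_hit_rate(groups)]
```

A larger `alpha` makes the measured frequency track a workload change within fewer intervals, at the price of more noise when the workload is stable.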

The last subroutine of Wolf handles erase operations. It
identifies the group with the greatest over-provisioning
deficiency and assigns the erased block to it. After this, we
consider reallocating more over-provisioned space by triggering
additional movement operations.


Listing 1: handle write

    input: write
    logi_addr = write.logi_addr
    phys_addr = MT[logi_addr]
    block_id = phys_addr.block_id
    group_id = BGM[block_id]
    curr_group = GROUPS[group_id]
    new_group_id = TD.find_target(group_id, logi_addr)
    new_group = GROUPS[new_group_id]
    new_phys_addr = new_group.find_free_addr()
    write.new_phys_addr = new_phys_addr
    flash.schedule(write)
    curr_group.size--
    new_group.size++
    if (write is not a garbage-collection operation)
        new_group.interval_writes++
    if (new_group.num_free_pages < B)
        new_group.garbage_collect()
Listing 3: erase

    flash.schedule(erase)
    find group x with the greatest over-provisioning deficiency, given (s_x, OP_x)
    block_id = erase.block_id
    BGM[block_id] = x.id
    consider_movement_operations()

With Wolf’s skeleton in mind, let us reiterate why it
adapts better to changes in workload. Firstly, Wolf detects
changes in workload through statistics. It is highly
responsive to changes in workload since intervals are short.
As groups change in update frequency relative to each other,
their order is re-sorted, and the allocation of over-provisioned
space among them is recomputed and pro-actively adjusted.
This is in sharp contrast to existing schemes, whereby the
ordering of groups is fixed, and write-amplification is controlled
by moving pages among the groups, as opposed to
by moving over-provisioned space among them.

^7 The parameters we define in this Section remained fixed
throughout the experiments presented in Section 6. In this
paper, we do not present a thorough exploration of the
performance impact of these tuning parameters. This is a topic
for future work.

An additional advantage of Wolf over existing schemes is
that its statistics provide a more accurate input to the
over-provisioning allocation strategy. FDP makes assumptions
about the update frequencies of the groups based on their
relative order, but does not measure them. Thus, Wolf is
able to adjust better to a stable workload.

The next sections explain non-trivial parts of Wolf, marked
in red in the above subroutines. In Section 5.2, we describe
the policy for creating and merging groups. In Section
5.3, we describe the policy for reallocating over-provisioned
space among groups using movement operations. In Section
5.4, we discuss the garbage-collection policy within groups,
and in Section 5.5, we describe the novel closed-form method
for a near-optimal allocation of over-provisioned space among
groups. Finally, in Sections 5.6, 5.7 and 5.8, we discuss
compatibility with orthogonal concerns of temperature detection,
parallelism and wear-levelling respectively.

5.2 Groups Creation And Merging Policies 

Let us describe the policy for creating or merging groups
invoked in subroutine 2. Wolf is initialized with a minimum
of two groups. A page that is written for the first time is
always assigned to the coldest group. A hotter additional
group is created when the following conditions hold: (1) the
group with the highest hit rate in SGV must have at least
F pages, and (2) the ratio between the hit rates of the two
hottest groups in SGV must be at least Q. Rule 1 is meant
to avoid the creation of an excessive number of very small
groups, and rule 2 is meant to ensure that the creation of a
new group is motivated by a real temperature skew in the
workload.

We set the constant F in rule (1) to be the number of LUNs
in the SSD multiplied by the number of pages in a block.
Thus, each group has at least one block in each LUN. This
ensures natural load balancing across the parallel architecture
of the SSD.

An implication of rule (2) is that the hit rate of groups
increases exponentially for a stable workload. Indeed, [?]
showed analytically that the update frequency ratio among
pages in the same group may vary by up to 2 without
incurring any significant penalty. Thus, we set the constant
Q in rule (2) to two. Note that the exponential increase in
the hit rates of groups means we can handle very skewed
workloads with relatively few groups.
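The two creation rules can be sketched as a single predicate over the Sorted Groups Vector. The representation of the SGV as a list of (update frequency, size) pairs ordered from coldest to hottest, and the function name, are assumptions of this example; the default for F is the number of LUNs times the pages per block under the default simulation parameters (8 × 128).

```python
def should_create_hotter_group(sgv, min_pages=8 * 128, q=2.0):
    """Return True when both creation rules hold.

    sgv: list of (update_frequency, size_in_pages) pairs, sorted from
    coldest to hottest by hit rate (an assumed layout for this sketch).
    """
    if len(sgv) < 2:
        return False
    p_hot, s_hot = sgv[-1]
    p_next, s_next = sgv[-2]
    hit_hot = p_hot / s_hot
    hit_next = p_next / s_next
    rule_1 = s_hot >= min_pages          # hottest group is large enough
    rule_2 = hit_hot >= q * hit_next     # genuine temperature skew
    return rule_1 and rule_2


skewed = should_create_hotter_group([(0.1, 5000), (0.9, 5000)])
mild = should_create_hotter_group([(0.4, 5000), (0.6, 5000)])
tiny = should_create_hotter_group([(0.1, 5000), (0.9, 500)])
```

In this example only the first case qualifies: the second fails rule 2 because the hit-rate ratio is 1.5, and the third fails rule 1 because the hottest group holds fewer than F pages.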

When a new hot group is created, it takes time for pages
to flow into it and for its long-term update frequency p_x to
adjust and stabilize. We give it time by fixing its position in
the SGV as well as banning the creation or merging of any
additional hot group for the next w intervals (w = 50 in our
implementation).

If the update frequencies of two adjacent groups in SGV
diverge and the hotter one is hotter by a factor of more
than Q · 2, we create a new empty group in the middle, again
fixing its location for w intervals to allow it to stabilize.

If two adjacent groups in SGV converge in terms of hit
rate for over w intervals, they are merged. Moreover, if
the number of pages in a group drops below F, we merge
the group with one of its adjacent groups in SGV. Note
that a merge is logical, as it only involves consolidating the
metadata about the groups.

The implication of our groups’ creation and merging policy
is that when a workload is highly skewed, the number
of groups would tend to oscillate. This is indeed what we
observe with a TPC-C workload in Section 6.

This policy improves on FDP in the sense that FDP has
neither a policy for merging groups nor a way to ensure that
the creation of a new group is motivated by a genuine skew in
the workload as opposed to randomness. Thus, the number
of groups in FDP tends to grow over time.

5.3 Movement Operations 

In both subroutines [2] and E a procedure is called to 
consider reallocating physical space among the groups. It
works by scanning the metadata of each group. It identifies
groups that have a block-surplus relative to the dictated
amount of over-provisioned space the group should have. 
It then triggers garbage-collection operations within such 
groups. The redeemed blocks are donated to groups with a 
block-deficiency relative to the dictated allocation, as shown 

in subroutine E 

There is an interesting cost-benefit question regarding how
greedily to trigger movement operations. On the one hand,
movement operations constitute a clear cost in terms of
migrations. On the other hand, movement operations are an
investment in future performance, as they help reduce the
overall write-amplification. We investigated different
strategies for pacing movement operations, but found that
issuing movement operations as greedily as possible with no
restrictions always minimized the overall number of
migrations in our experiments. This indicates that the
investment in movement operations is worthwhile.

In contrast to Wolf, in FDP a group may only be donated 
blocks from the two groups adjacent to it, namely the next 
colder and next hotter groups. This restricts the adaptivity 
of the scheme to changes in workload, as a block may need to 
cross multiple groups to reach its target. In contrast, Wolf
has the flexibility of being able to move blocks between any 
groups. 
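
The reallocation procedure described in this section can be sketched as a matching between groups with a block-surplus and groups with a block-deficiency. The function name `plan_movements` and the dictionary-based bookkeeping are assumptions for illustration; in a real FTL each transfer would first require garbage-collection in the donor group to free the blocks.

```python
def plan_movements(current_blocks, target_blocks):
    """Pair groups holding surplus blocks with groups that lack blocks.

    Both arguments map a group id to a number of physical blocks.
    Returns a list of (donor, recipient, blocks) transfers. Any group may
    donate to any other group; adjacency in the SGV is not required.
    """
    surplus = {g: current_blocks[g] - target_blocks[g]
               for g in current_blocks if current_blocks[g] > target_blocks[g]}
    deficit = {g: target_blocks[g] - current_blocks[g]
               for g in current_blocks if current_blocks[g] < target_blocks[g]}
    transfers = []
    # Serve the largest deficits first (an arbitrary choice for this sketch).
    for recipient, need in sorted(deficit.items(), key=lambda kv: -kv[1]):
        for donor in list(surplus):
            if need == 0:
                break
            moved = min(need, surplus[donor])
            transfers.append((donor, recipient, moved))
            need -= moved
            surplus[donor] -= moved
            if surplus[donor] == 0:
                del surplus[donor]
    return transfers
```

With three groups holding 10, 5 and 1 blocks and targets of 4, 5 and 7 blocks, the sketch moves 6 blocks directly from group 0 to group 2 without passing through group 1, which is the flexibility that distinguishes Wolf from FDP here.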

5.4 Cleaning Policy within a Group 

A garbage-collection operation is triggered in a group when 
it has less than B free pages. The question we now address 
is how to choose a victim block. We compare two policies: 
LRU and greedy. 

Previous work HSl showed that the difference between the 
greedy and LRU policies in equilibrium is not great (though 
it does increase slightly for higher values of LBA/PBA). As 
LRU is simpler to implement, state-of-the-art FTLs such as 
FDP [20] use the LRU policy. However, the heuristic of 
the LRU policy, namely that the block that was cleaned 
the longest time ago contains the fewest live pages, may 
sometimes underperform, particularly due to movement op¬ 
erations, which were introduced in the last section. 

Movement operations pack the pages of groups with
block-excess more compactly on fewer blocks. Suppose that many
of them occur rapidly in a group relative to the size and
update frequency of the group. By the time they are finished,
a significant number of blocks in the group will be
completely full. In such a scenario, the block that has been
least recently erased holds statistically the same number of
live pages as the other blocks in the group, so the LRU
heuristic fails.

Figure 2: The greedy vs. LRU policies

To demonstrate this, we ran a micro-experiment using
Wolf whereby the workload is divided into two groups $g_1$
and $g_2$, where $s_1 = s_2$, $P_1 = 100\%$ and $P_2 = 0\%$. Initially,
$g_1$ has most of the over-provisioned space and $g_2$ has almost
none. We then abruptly swap the groups’ update frequencies.
Movement operations are issued, rapidly compacting
the pages of group 1 into few physical blocks and moving
the redeemed blocks to group 2. This creates a vast number
of filled blocks with the same number of live pages in group
1. We then swap the update frequencies of the groups again.
Cleaning operations are triggered in group 1, but since the
blocks in this group have the same number of live pages, the
LRU policy is unable to choose a block with exceptionally
few live pages. Write-amplification after the second swap
is shown in Figure 2. Approximately 15% more migrations
take place in the LRU policy compared to the greedy policy
until write-amplification converges.

Traditionally, the argument used against the greedy policy 
has been that its computational cost is high, particularly 
the process of choosing a victim. However, victim selection 
does not need to consider all blocks in the SSD. It is enough 
to consider only the blocks in a particular group within a 
particular LUN. As there are usually tens of LUNs in an 
SSD, this narrowing of the search allows reducing the cost 
of victim-selection by an order of magnitude. 
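
The two victim-selection policies can be contrasted with a small sketch restricted to the blocks of one group on one LUN. The `Block` record and its field names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

@dataclass
class Block:
    block_id: int
    live_pages: int   # pages that would have to be migrated before erasing
    erased_at: int    # logical timestamp of the block's last erase

def greedy_victim(blocks):
    """Greedy policy: pick the block with the fewest live pages."""
    return min(blocks, key=lambda b: b.live_pages)

def lru_victim(blocks):
    """LRU policy: pick the block that was erased the longest time ago."""
    return min(blocks, key=lambda b: b.erased_at)

# Blocks of a single group on a single LUN; the oldest block happens to be
# nearly full of live pages, so the LRU heuristic picks an expensive victim.
blocks = [Block(0, 60, erased_at=1), Block(1, 12, erased_at=5), Block(2, 40, erased_at=3)]
print(greedy_victim(blocks).block_id, lru_victim(blocks).block_id)
```

The greedy scan is linear in the number of blocks it considers, which is why limiting it to one group within one LUN keeps its cost low.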

5.5 Over-Provisioning Allocation 

We now describe the closed-form method for determining
how to allocate over-provisioned space among groups. Let
$OP_x$ be the over-provisioned space given to group $x$. The
total amount of physical space consumed by group $x$ is
$(s_x + OP_x)/B$ physical blocks. Let $\delta_x$ be the fraction of pages
that need to be migrated per cleaning operation in group
$x$. Using equation (3), we can relate $s_x$, $OP_x$ and $\delta_x$ for any
group $x$:

$$\frac{s_x}{s_x + OP_x} = \frac{\delta_x - 1}{\ln(\delta_x)} \qquad (4)$$
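
The relation between a group's size, its over-provisioned space and its cleaning fraction has no elementary inverse, but it can be solved numerically. The sketch below assumes that the relation has the form $s_x/(s_x + OP_x) = (\delta_x - 1)/\ln(\delta_x)$, whose right-hand side increases monotonically from 0 to 1 on the interval (0, 1); the function name `cleaning_fraction` and the use of bisection are choices made for this example only.

```python
import math

def cleaning_fraction(size, over_provisioned):
    """Solve s / (s + OP) = (delta - 1) / ln(delta) for delta in (0, 1).

    The relation is an assumed form of the equation above; bisection works
    because the right-hand side is increasing in delta.
    """
    ratio = size / (size + over_provisioned)
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = (lo + hi) / 2
        if (mid - 1) / math.log(mid) < ratio:
            lo = mid      # delta too small: move the lower bound up
        else:
            hi = mid
    return (lo + hi) / 2

delta = cleaning_fraction(size=70, over_provisioned=30)
print(round(delta, 3))
```

Under this assumed relation, giving a group more over-provisioned space lowers the fraction of pages that must be migrated per cleaning operation, which is the effect the allocation policy exploits.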

Let us denote $WA(s_x, OP_x)$ as the write-amplification of
group $x$, which is a function of the size and over-provisioned
space allocated to that group. Overall write-amplification
is the sum of the write-amplifications of all groups weighted
by their update frequencies.

$$WA = \sum_{i=1}^{n} P_i \cdot WA(s_i, OP_i) \qquad (5)$$


Allocating the over-provisioned space among the groups so
as to minimize write-amplification is an optimization problem.
Although [20, 9] proposed methods for finding the
optimal allocation, no closed-form method was proposed.
Moreover, the methods explored in these works are either
expensive (e.g. relying on iterative optimization algorithms),
of questionable accuracy (polynomials of best fit that hold
the relative update frequencies of groups as a constant), or
of questionable adaptability to workload changes (i.e. they
only support movement of blocks between adjacent groups).

It is important that the over-provisioning allocation
strategy be both accurate and cheap to compute because
it is invoked frequently, especially with a changing
workload. We now present a novel closed-form approach.

Our key insight is that the optimal allocation of
over-provisioned space exists on a plateau defined by a few key
points in the optimization space. We find two such points by
considering two hypothetical suboptimal policies and then
merging their solutions. The first policy assigns over-provisioned
space to groups based only on the group’s size, and the other
assigns over-provisioned space based only on the group’s
update frequency. We can then approximate the location of the
optimum based on these two key points.

5.5.1 Considering just Size

The first policy performs garbage-collection greedily across 
groups. It always picks as a victim the block with the least 
number of live pages in the SSD, regardless of which group 
it belongs to. A redeemed block is allocated randomly to 
any group that has less than B free pages. 

Let us reason about equilibrium in this kind of system.
Suppose there are two groups $g_1$ and $g_2$ with different cleaning
efficiencies such that $\delta_1 < \delta_2$. The cleaning policy will
only target blocks from group $g_1$ since that is where the
cheapest blocks to reclaim are. However, writes are still taking
place in group $g_2$. As $g_2$ runs out of space, some of the
blocks garbage-collected from group $g_1$ will flow to group
$g_2$. This will increase $\delta_1$ and decrease $\delta_2$ until they
equalize. At this point, the blocks from both groups will be just
as expensive to garbage-collect.

This equalization principle holds regardless of the number
of groups, their sizes or their update frequencies. Eventually,
cleaning efficiency among all groups equalizes such that
$\delta_1 = \delta_2 = \dots = \delta_n$. This also means that the amount of
over-provisioned space in each group reaches a fixed point (it may
oscillate, but the cleaning policy will cause it to converge
again). So, even though this is an open system, whereby
blocks can flow among groups, in equilibrium the amount of
over-provisioned space in each group can be assumed to be
fixed, and so each group can be analysed as a closed system.

By equation (4), the fact that cleaning efficiency for all
groups in equilibrium is equal implies that the following
holds:

$$\frac{s_1}{s_1 + OP_1} = \frac{s_2}{s_2 + OP_2} = \dots = \frac{s_n}{s_n + OP_n}$$

It is therefore easy to see that for any group $g_x$:

$$\frac{s_x}{s_x + OP_x} = \frac{s_1 + \dots + s_n}{s_1 + \dots + s_n + OP_1 + \dots + OP_n} = \frac{LBA}{PBA}$$

With some rearranging, the exact amount of over-provisioned
space allocated to group $x$ in equilibrium is the
following expression, which only depends on its size. We
simplify by letting $V = \frac{PBA}{LBA} - 1$.

$$OP_x = s_x \cdot \left(\frac{PBA}{LBA} - 1\right) = s_x \cdot V \qquad (6)$$

Figure 3: The write-amplification optimization space for
several different 2-modal workloads (x-axis: over-provisioning
division point). The fraction from 0 to
the blue dot is the amount of over-provisioned space given
to group 1, and the rest is given to group 2. The green
and black dots correspond to the point of division based
on update frequency or group size alone. The blue dot is
their average.

5.5.2 Considering just Update Frequency

The second hypothetical policy we consider assigns
over-provisioned space directly based on each group’s update
frequency. It is solely based on the observation that a group
with a higher update frequency should have more over-provisioned
space. Thus, the overall over-provisioned space
$OP = PBA - LBA = V \cdot LBA$ is partitioned among groups based
on each group’s update frequency, completely disregarding
their sizes.

$$OP_x = P_x \cdot OP = P_x \cdot V \cdot LBA \qquad (7)$$

Note that this policy directly fixes the amount of
over-provisioned space in each group, whereas in the previous
one the amount of over-provisioned space per group was
determined indirectly through a convergence process.

5.5.3 Mixing the expressions

Let us now combine the above two policies. Suppose we
wish to find a near-optimal allocation of over-provisioned
space for the groups. We get two different values from
equations (6) and (7). The actual optimum must lie somewhere
in between those values. In fact, it turns out that their average
is extremely close to the optimum:

$$OP_x = \frac{s_x \cdot V + P_x \cdot OP}{2} \qquad (8)$$

This is shown for several 2-modal workloads in Figure 3.
The red line shows how the division point of the over-provisioned
space among the groups affects overall write-amplification.
The dots along the line give the division point given by the
3 different policies.

We evaluated formula (8) through a brute-force exploration
of the workload space. We partition the logical address space
and update frequency space into Q equally sized chunks of
sizes LBA/Q and 1/Q respectively. Each group gets at least
one chunk of the logical address space and update frequency
space. Under these constraints, we test all possible unique
workload configurations. We vary the number of groups
from 2 to 9. We repeated all experiments for 4 different
over-provisioning levels (LBA/PBA values of 0.6, 0.7, 0.8 and
0.9). Note that Q controls how skewed the workload can be.
The bigger it is, the more extreme the differences between
groups in terms of update frequencies and sizes can be. We
used two different values of Q, 10 and 20. All in all, our
exploration captured a little over a million different workload
configurations.

Note that the space of workloads covered in our exploration
includes a TPC-C workload. Figure 9 shows an update
histogram of the core data of a TPC-C workload. We see
that pages are divided into two clusters in terms of temperature.
The hotter cluster’s peak is approximately 8 times
hotter than the other cluster, and both clusters are similar
in size. It is easy to see that the above brute-force approach
covers such a workload configuration. We demonstrate this
empirically in the evaluation.

The baseline against which we compare the closed-form method
is a hill-climbing algorithm similar to the one
given in [9], which always finds the optimum because the
optimization space for write-amplification is convex.

Figure 4: The average and maximum percentages off the
optimal for all configurations with a certain number of groups.
In these experiments, LBA/PBA = 0.7.



Figure 5: The average and maximum percentages off the
optimal for all configurations with certain values of LBA/PBA.
The number of groups is fixed to 5.

Trends from this analysis are displayed in Figures 4 and
5. Both graphs show the average and maximum percentage
differences of our policy from the optimum for the two
different values of Q for every workload configuration tried.
On average, our policy is within 1% of the optimum for all the
workload configurations. The maximum departures from
the optimum are between 2% (when Q=10) and 9% (when
Q=20). Put differently, the closed form we propose works
well except for very skewed workloads where a group is either
very hot or very cold.
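
The closed-form allocation evaluated above amounts to a few arithmetic operations per group. The sketch below averages the size-based allocation of equation (6) and the frequency-based allocation of equation (7), as equation (8) prescribes; the function name `closed_form_allocation`, the use of pages as the unit, and the assumption that update frequencies sum to 1 are choices made for this example.

```python
def closed_form_allocation(sizes, frequencies, pba):
    """Return the over-provisioned space per group, in pages.

    sizes[x] is the logical size of group x, frequencies[x] its share of
    all updates (assumed to sum to 1), and pba the physical capacity.
    """
    lba = sum(sizes)
    total_op = pba - lba          # OP: overall over-provisioned space
    v = pba / lba - 1             # V = PBA / LBA - 1
    by_size = [s * v for s in sizes]                      # equation (6)
    by_frequency = [p * total_op for p in frequencies]    # equation (7)
    return [(a + b) / 2 for a, b in zip(by_size, by_frequency)]  # equation (8)

# Two equally sized groups, one receiving 90% of the updates.
print(closed_form_allocation([500, 500], [0.9, 0.1], pba=1400))
```

In this example the size-only policy would give each group 200 pages and the frequency-only policy 360 and 40 pages, so the average is about 280 and 120 pages. The per-group allocations always add up to the total over-provisioned space, since both underlying policies do.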

In order to take such skew into account in practice, we 
provide special treatment to the coldest group. When the hit 
rate of the coldest group is less than a certain fraction of the 
second coldest group, set to 5% in our implementation, we 
allocate a fixed over-provisioned space to the coldest group, 
in our implementation 5% of the smallest group’s logical 
size. For the remaining groups we allocate over-provisioned 
space using the closed-form method. This works well in
practice for the TPC-C workload, in which a high percentage of
the data is extremely cold relative to the rest.
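
The special treatment of the coldest group reduces to a simple test. In the sketch below, the function name `cold_group_allocation` and the convention of returning `None` when the closed-form method should be applied to all groups are assumptions; the two 5% constants are the values quoted above.

```python
def cold_group_allocation(hit_rates, sizes, threshold=0.05, fraction=0.05):
    """Return the fixed over-provisioned space for the coldest group, or None.

    hit_rates and sizes are per-group lists in the same order. A fixed
    allocation is used only when the coldest group's hit rate is below
    `threshold` times that of the second coldest group.
    """
    if len(hit_rates) < 2:
        return None
    coldest, second_coldest = sorted(hit_rates)[:2]
    if coldest < threshold * second_coldest:
        return fraction * min(sizes)   # 5% of the smallest group's logical size
    return None

print(cold_group_allocation([0.001, 0.3, 0.699], [5000, 2000, 1000]))
```

Here the coldest group receives well under 5% of the second coldest group's hits, so it is given a fixed 50 pages and the remaining groups are left to the closed-form method.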

5.6 Temperature Detection 

A key component of subroutine [T] used to handle writes is 
to determine if the logical page should remain in the same
group, or be demoted or promoted. A page should be promoted
or demoted only if its update frequency changes relative
to the group. Wolf uses one independent temperature
detector for each group. If the temperature detector deems
a page in group i relatively hotter or colder than its group,
it migrates it to group i+1 or i-1 respectively.

Temperature detectors are subject to a trade-off between
bookkeeping overhead and accuracy, and implementing a
detector that strikes a good balance between these aspects
is non-trivial. In our implementation, we used a detector
inspired by [18]. It consists of 2 Bloom filters, one active
and one passive. When a page is written to a group, its
logical address is inserted into the active filter. Each group
has independent write intervals, each of which is set to be
as long as the number of pages in the group. At the end of
an interval, the passive Bloom filter is erased, the active one
becomes passive, and a new active filter is created. This new
Bloom filter is initialized with a false-positive probability of
0.3 and a projected number of elements equal to the current
size of the group. Thus, each page in the SSD needs 5 bits
in RAM for both Bloom filters, which is not a significant
overhead. The filters are used as follows. If the address of a
page update exists in both the active and passive filters and
the write is an application update, the page is promoted to the
next hotter group. If the address is in neither Bloom filter and
the write is a garbage-collection migration, the page is demoted
to the next colder group. Otherwise, the page is updated in
its current group. Designing a better temperature detector
for Wolf is a topic for future work.
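
The two-filter detector can be sketched as follows. The class names, the hashing scheme, and the choice to record only application updates in the active filter and in the interval counter are assumptions made for this example; the text does not specify them. For a false-positive probability of 0.3, the standard Bloom filter sizing formula gives -ln(0.3)/(ln 2)^2 ≈ 2.5 bits per element per filter, which matches the 5 bits per page quoted above for the two filters.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, expected_items, false_positive_rate):
        n = max(1, expected_items)
        self.num_bits = max(1, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

class TemperatureDetector:
    def __init__(self, group_size, false_positive_rate=0.3):
        self.group_size = group_size
        self.fp_rate = false_positive_rate
        self.active = BloomFilter(group_size, false_positive_rate)
        self.passive = BloomFilter(group_size, false_positive_rate)
        self.writes_in_interval = 0

    def on_write(self, address, is_gc_migration):
        """Return +1 (promote), -1 (demote) or 0 (stay) for a page write."""
        in_active = address in self.active
        in_passive = address in self.passive
        if is_gc_migration:
            return -1 if not in_active and not in_passive else 0
        decision = 1 if in_active and in_passive else 0
        # Assumption: only application updates are recorded and counted.
        self.active.add(address)
        self.writes_in_interval += 1
        if self.writes_in_interval >= self.group_size:
            # End of interval: discard the passive filter and rotate.
            self.passive = self.active
            self.active = BloomFilter(self.group_size, self.fp_rate)
            self.writes_in_interval = 0
        return decision
```

A page is therefore promoted only if it was updated in the previous interval and again at least twice in the current one, and demoted only if garbage-collection encounters it without any recorded update in either interval, subject to the filters' false positives.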

5.7 Garbage-Collection and Parallelism 

Wolf exploits the SSD’s available parallelism and achieves 
load-balancing by partitioning each group equally among all 
LUNs. It does so by maintaining free space for every group 
in every LUN, similarly to [Bj. We introduce the notion 
of a subgroup to refer to the blocks of a given group that
are on a particular LUN. Wolf’s garbage-collection policy 
issues a garbage-collection operation in a subgroup as soon 
as the number of free pages in the subgroup falls below B. 
The manner in which a victim is chosen is described in
Section 5.4. In terms of scheduling, a write targeting a group
is scheduled on some subgroup of that group that has free space
and whose LUN is currently not busy. During garbage-collection,
the pages that are migrated can be written on
any subgroup of the target group.
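
The scheduling and cleaning triggers described in this section can be sketched with a per-LUN record for each group. The `Subgroup` record, its field names, and the first-fit choice among eligible subgroups are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Subgroup:
    lun: int
    free_pages: int
    lun_busy: bool = False

def pick_subgroup(subgroups):
    """Return a subgroup with free space whose LUN is idle, or None."""
    for subgroup in subgroups:
        if subgroup.free_pages > 0 and not subgroup.lun_busy:
            return subgroup
    return None

def needs_cleaning(subgroup, pages_per_block):
    """Garbage-collection is issued once free pages fall below B."""
    return subgroup.free_pages < pages_per_block

# The subgroups of one group, one per LUN.
group = [Subgroup(lun=0, free_pages=0), Subgroup(lun=1, free_pages=40, lun_busy=True),
         Subgroup(lun=2, free_pages=200)]
print(pick_subgroup(group).lun, [needs_cleaning(s, 128) for s in group])
```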

5.8 Wear-Levelling 

The specifics of wear-levelling are orthogonal to the design 
of Wolf. Dynamic wear-levelling can take place by reassign¬ 
ing old blocks from hot groups to young groups to decelerate 
their wear, and young blocks from cold groups to hot groups 
to accelerate their wear. Static wear-levelling can take place 
in the coldest group to free exceptionally young blocks and 
assign them to hot groups. 

6. EVALUATION 

Let us evaluate Wolf against the state-of-the-art. As in [20],
we rely on simulation for our evaluation. We thus ignore
the impact of error correction and bad block management
on write-amplification, and the possible RAM limitations of
a given SSD model. Rather, we focus on the impact of
over-provisioned space utilisation on write-amplification with a
simulator that reflects the internal structure of an actual
SSD. We are in contact with an SSD constructor to
validate our simulation results.

The recent scheme in FDP [20] has been shown to dominate
existing block managers that strive to separate pages
based on their update frequencies to reduce write-amplification,
so it will serve as the main frame for comparison. We implemented
in EagleTree the data allocation mechanisms from
FDP, as described in [20]. We begin the evaluation by assuming
an oracle temperature detector that knows the precise
update probability of every page. We start by comparing
how these schemes adapt to changes in workload.

Figure 6: How write-amplification evolves in time with
FDP as we swap the update frequencies of two groups. The
swap occurs at the dotted line.

Figure 7: How write-amplification evolves in time with
Wolf as we swap the update frequencies of two groups. The
swap occurs at the dotted line.

Figure 8: Garbage-collection performance as we swap the
update frequencies of two groups after 5 million writes.

6.1 Frequency Swap 

In the first experiment, we examine how the schemes adapt 
to a change in the update frequencies of two groups. Groups 
1 and 2 are equally-sized and have update frequencies 10% 
and 90%, and at some point their update frequencies swap. 

Figure 6 shows the evolution of write-amplification as
a result of a swap with FDP. As described in Section 3,
the pages originally in group 2 which turned cold reside on
blocks that occupy a significant amount of over-provisioned
space, and it takes a considerable amount of time before the
garbage-collection algorithm targets them. Thus,
write-amplification is high and slow to decrease again.

In Figure 7 we see the result of a swap in Wolf. The
change in update frequency is quickly detected, and movement
operations aggressively take place in group 1 to reallocate
over-provisioned space from the now cold group to the
now hot group. Equilibrium is restored much more quickly.

We can compare the schemes using a baseline whereby
no swap happens. The additional number of migrations
for Wolf and FDP relative to the no-swap scenario,
divided by PBA, is 0.7% and 152.1% respectively. In other
words, Wolf hardly requires any additional migrations
except for a short spike, whereas FDP incurs additional
migrations equivalent to overwriting the SSD 1.5 times over.

In order to generalize this result, we show the difference in 
performance between FDP and Wolf when the groups being 
swapped have different update frequencies. In the experi¬ 
ment, there are 5 groups with ids 0,1,2,3 and 4 that have 
exponentially increasing update frequencies of around 3.2%, 
6.4%, 12.8%, 25.6% and 51.2%, such that the sum of the 
update frequencies adds up to 100%. We swap the update 
frequencies of every pair of groups. We measure the total 
difference in migrations between FDP and Wolf and normal¬ 
ize it through division by the number of physical pages in 
the SSD. The bar chart shows that Wolf always outperforms 
FDP, with up to 2.2x performance improvements. The per¬ 
formance improvement increases with the difference between 
the groups in terms of update frequency. 

6.2 Realistic Workload 

In this section, we examine how Wolf behaves under a
TPC-C workload. We generated an I/O trace of the TPC-C
benchmark using Shore-MT and Shore-Kits [2]. We added
print statements to the Shore-MT code to collect the
logical addresses being written to. We used a scaling factor
of 48 to match the size of our simulated SSD. The amount
of RAM given to the Shore-MT buffer pool was set to 5% of
the size of the data.

A challenge of using TPC-C for benchmarking over SSDs
is that the size of the database grows. Of every 100 writes
on average, 3 are to new addresses. This is a problem as
it causes over-provisioning to change throughout an experiment,
thereby making it difficult to control and interpret.

What is even more interesting is that the data that is
added to the database throughout an experiment, which we
call TPC-Cadded, has distinctly different workload characteristics
than the data written during initialization, which
we call TPC-Cinit. Figure 9 shows the update frequency
distribution for the logical pages in TPC-Cinit. The temperature
of most pages in TPC-Cinit remains stable. Of the
pages in TPC-Cadded, however, 68% are only written once
and never updated and the remaining 32% tend to be hot
for a short period of time and are then never updated again.
Investigating how to cater for these different workload
characteristics is the subject of future work.

Here, we just focus on the updates in TPC-Cinit (47 mil¬ 
lion writes to 802816 addresses). We assume that the up¬ 
dates in TPC-Cadded (10 million writes to 1815483 addresses) 
are written elsewhere and so we remove them from the work¬ 
load. Thus, over-provisioning remains constant throughout 
our experiments, and the workload is stable: the tempera¬ 
ture of pages does not significantly change. 

How does Wolf compare to FDP under a TPC-C workload?
When investigating this question, we found two factors
that play a role: (i) the definition of groups, as their size
and update frequency change (adaptive for Wolf vs.
non-adaptive for FDP), and (ii) the allocation of over-provisioned
space (closed form for Wolf vs. the iterative method from [9,
20], which is theoretically optimal if group sizes and update
frequencies are defined). In order to separate the influence
of each factor, we ran two sets of experiments: first we varied
group definition (both for closed form allocation), then
we varied allocation strategy (both for adaptive groups).

Figure 9: Update histogram of logical addresses in
TPC-Cinit that have update frequencies between 1 and 300,
which includes 46% of the pages. Of the remaining pages,
54% are never updated and 0.2% are updated more than
300 times.

Figure 10: Comparison of two methods for allocating
over-provisioned space to different groups. The red pluses and
minuses indicate the creation or merging of groups respectively
for the closed-form policy.

Figure 10 shows the results of both experiments. The blue
line corresponds to Wolf (flexible group definition, closed
form allocation), the red line corresponds to the theoretically
optimal iterative method (with flexible group definition),
and the green line corresponds to FDP’s group definition
(with closed form allocation). The red and green lines allow
us to separate the features of FDP and how they relate to
Wolf. The grey line, used as a baseline, illustrates
write-amplification over time if all pages are mixed in one group.

First, our experiment confirms the significant superior¬ 
ity of data allocation schemes that separate pages based 
on update frequency over the baseline that considers over¬ 
provisioned space as a single group. 

Second, we observe that the policies corresponding to the
red and blue lines perform similarly. The iterative policy
involves 0.2% fewer migrations than the closed-form policy,
which suggests that for a realistic workload the theoretical
difference between the policies explored in Section 5.5.3 is
in fact small. The number of groups in Wolf oscillates
between 7 and 9 when running TPC-C. The reason
for this is the skew of the workload. The last group tends
to be so much hotter than the second hottest group that
Wolf continually creates additional hotter groups (according
to the policy in Section 5.2). However, this leads to the middle
groups eventually converging in terms of hit rate, and they
are thus merged. The creation and merging of groups does
not involve any significant penalty. We observed some
fluctuations in garbage-collection, which occur due to
surges in movement operations (but are smoothed in Figure 10
for readability). Flattening such surges is a subject for future
work. The mechanism we proposed in Section 5.5.3 to
handle coldness skew kicks in with the TPC-C workload, as
the difference in update frequency between the coldest and
second coldest groups is two orders of magnitude greater
than the difference in update temperature between any other
two adjacent groups.

Third, we observe that the red and blue lines perform
consistently better than the green line (by approximately
22% in equilibrium). For the red and blue lines, the update
frequency of groups is continuously measured and adapted
as described in Section 5. For the green line, we consider
that groups have a fixed order, and that the pages in group
i are twice as hot as pages in group i-1 and half as hot as
pages in group i+1, as described in [20]. The workload skew
explains the difference between the flexible group management
of Wolf and FDP’s group management. The problem
comes from FDP’s assumption that the hit rate of one
group is within a given factor of its adjacent groups. Wolf
overcomes this by measuring how many writes actually target
each group in each interval. These much more accurate
figures of update frequencies allow the over-provisioning
allocation mechanism to give different groups an allocation
of over-provisioned space that yields lower
write-amplification. This shows that Wolf performs slightly better
than the state-of-the-art for a realistic stable workload, while
it outperforms it significantly when the workload changes.

As far as we are aware, SSD manufacturers today do not
implement schemes like FDP or Wolf. On real SSDs, pages
with different update frequencies are mixed on the same
blocks. Thus, running the experiments in this subsection
on a real SSD would result in a much higher overall
write-amplification that would resemble the grey line in Figure
10. We hope that with the recent advent of schemes like
FDP and Wolf, more SSD manufacturers will begin
implementing such schemes.

7. CONCLUSION 

Write-amplification is a key aspect of SSD performance.
It depends on (i) how much over-provisioned
space is made available, and (ii) how this over-provisioned
space is allocated to different virtual partitions of the SSD
(groups) defined by their update frequency. In this paper,
we presented Wolf, an FTL block manager designed to adapt
well to abrupt changes in workload and to the skew of
realistic workloads such as TPC-C. Future work includes the
deployment of Wolf in an open-channel SSD and its evaluation
with a larger set of database workloads.

APPENDIX

A. DEEPER ANALYSIS 

Through a series of simple manipulations and application
of the Lambert W function, we can express equation (3) in
terms of $\delta$:

$$\delta = -\frac{LBA}{PBA} \cdot W\left(-\frac{PBA}{LBA} \cdot e^{-\frac{PBA}{LBA}}\right) \qquad (9)$$

This is in fact the equation derived through different means
in [20, 10], so these analyses are equivalent.
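
The manipulations can be sketched as follows. This derivation assumes that equation (3) has the form $LBA/PBA = (\delta - 1)/\ln(\delta)$; the shorthand $a = LBA/PBA$ is introduced here only for brevity.

```latex
\begin{align*}
a = \frac{\delta - 1}{\ln \delta}
  &\;\Longrightarrow\; \ln \delta = \frac{\delta - 1}{a}
   \;\Longrightarrow\; \delta \, e^{-\delta / a} = e^{-1/a} \\
  &\;\Longrightarrow\; \left(-\frac{\delta}{a}\right) e^{-\delta / a}
     = -\frac{1}{a} \, e^{-1/a} \\
  &\;\Longrightarrow\; -\frac{\delta}{a} = W\!\left(-\frac{1}{a} \, e^{-1/a}\right)
   \;\Longrightarrow\; \delta = -a \, W\!\left(-\frac{1}{a} \, e^{-1/a}\right)
\end{align*}
```

Since $a < 1$, the argument of $W$ lies between $-1/e$ and 0, where the Lambert W function has two real branches. The lower branch returns $-1/a$ and hence the trivial solution $\delta = 1$, so the principal branch is the one that yields the cleaning fraction.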












